Automatic Pronoun Resolution for Swedish
Abstract
This thesis describes SwePron, an algorithm for automatic resolution of pronouns in Swedish, which takes Mitkov’s algorithm for pronoun resolution in English (MARS) as its starting point. Several changes are made to the algorithm to adapt it to Swedish. One key language difference is that Swedish has two “neutral” genders, whereas English has only one. A second important difference is that word order varies more in Swedish than in English. A number of other modifications, not directly related to language differences, are also investigated.

It is argued that incorporating lexical information is an important element in increasing the accuracy of the algorithm. One successful example is the use of lists of “human role nouns”, i.e. nouns that normally refer to human beings even though this is not reflected in their grammatical gender. Other kinds of lexical information are also used. Several attempts are made to use collocation information to improve accuracy. The results are disappointing, but this is probably because the text database is not large enough. Furthermore, machine learning techniques could be used to improve the evaluation of the collocation information. It is also argued that the distance between the anaphor and its antecedent should probably be given more weight than it has in MARS.

The algorithm is implemented in Java. It makes use of two existing applications, Granska’s Text Analyzer (GTA) and MaltParser/sweMalt, which together provide the syntactic parsing information used by the algorithm. The implemented algorithm is evaluated for third person singular pronouns. Extending it to resolve plural pronouns should be quite straightforward, and some guidelines for doing so are given.
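The gender difference and the human role noun list described above can be pictured as a candidate filter for third person singular pronouns. The following is a minimal sketch, written in Python for brevity (SwePron itself is implemented in Java). Everything in it is an assumption for illustration and not SwePron's actual data or API, including:

- the `Candidate` class;
- the `utr`/`neu` gender tags;
- the function names;
- the tiny `HUMAN_ROLE_NOUNS` sample.

In this simplified version, han/hon accept only nouns on the human role list, while den and det are matched on grammatical gender alone (common vs. neuter).

```python
from dataclasses import dataclass

# Hypothetical sample of a "human role noun" list: nouns that normally refer
# to people although their grammatical gender does not show it.
HUMAN_ROLE_NOUNS = {"läkare", "lärare", "barn"}


@dataclass
class Candidate:
    lemma: str
    gender: str  # assumed tags: "utr" (common gender) or "neu" (neuter)


def is_compatible(pronoun: str, cand: Candidate) -> bool:
    """Agreement check for third person singular pronouns (simplified)."""
    p = pronoun.lower()
    if p in ("han", "hon"):
        # Personal pronouns: accept only nouns on the human role list.
        return cand.lemma in HUMAN_ROLE_NOUNS
    if p == "den":
        return cand.gender == "utr"  # common gender
    if p == "det":
        return cand.gender == "neu"  # neuter gender
    raise ValueError(f"unsupported pronoun: {pronoun!r}")


def filter_candidates(pronoun: str, candidates: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that agree with the pronoun."""
    return [c for c in candidates if is_compatible(pronoun, c)]


cands = [
    Candidate("bil", "utr"),     # 'car'
    Candidate("hus", "neu"),     # 'house'
    Candidate("läkare", "utr"),  # 'doctor'
    Candidate("barn", "neu"),    # 'child'
]
print([c.lemma for c in filter_candidates("hon", cands)])  # ['läkare', 'barn']
print([c.lemma for c in filter_candidates("det", cands)])  # ['hus', 'barn']
```

The list is what lets hon pick up läkare and barn, whose grammatical genders (common and neuter) are the same as those of the non-human nouns bil and hus. Because den is matched on gender alone here, läkare also remains a candidate for den. A fuller system would presumably also handle person names and refine such cases.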
The performance of SwePron seems to be roughly similar to that of MARS, although a detailed comparison is difficult because of differences in coverage and in the syntactic parsing information available. It is suggested that some of the modifications tried in this thesis could also be evaluated for pronoun resolution in English; two candidates are the human role noun list and an increased weight for the referential distance factor.
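The suggestion to weight referential distance more heavily can be illustrated with a MARS-style scorer. Each candidate's antecedent-indicator scores are summed, and the highest total wins. This is a hypothetical Python sketch, and several parts of it are assumptions for illustration rather than values taken from SwePron or MARS:

- the distance scores (2, 1, 0 and -1 for candidates 0, 1, 2 and 3+ sentences back);
- the indicator names;
- the `distance_weight` parameter.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    lemma: str
    sentences_back: int  # 0 = same sentence as the pronoun
    # Scores from other (hypothetical) antecedent indicators.
    indicators: dict[str, int] = field(default_factory=dict)


def distance_score(sentences_back: int) -> int:
    """Assumed referential-distance scores: nearer candidates score higher."""
    return {0: 2, 1: 1, 2: 0}.get(sentences_back, -1)


def total_score(cand: Candidate, distance_weight: float = 1.0) -> float:
    """Sum of indicator scores plus the weighted distance score."""
    return sum(cand.indicators.values()) + distance_weight * distance_score(
        cand.sentences_back
    )


def resolve(candidates: list[Candidate], distance_weight: float = 1.0) -> Candidate:
    """Pick the highest-scoring candidate; ties go to the nearer one."""
    return max(
        candidates,
        key=lambda c: (total_score(c, distance_weight), -c.sentences_back),
    )


far = Candidate("läkare", 2, {"first_np": 1, "lexical_reiteration": 2})
near = Candidate("lärare", 0)
print(resolve([far, near]).lemma)                       # läkare (3 vs. 2)
print(resolve([far, near], distance_weight=2.0).lemma)  # lärare (3 vs. 4)
```

The example uses a distant candidate that scores 3 on other indicators and a candidate in the same sentence that scores 0. The distant one wins at the default weight but loses once the distance factor is doubled. Tuning such a weight is where the change suggested above would take effect.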
Similar resources
Pronoun resolution and discourse models
Psychological investigations of pronoun resolution have implicitly assumed that the processes involved automatically provide a unique referent for every pronoun. We challenge this assumption and propose a new framework for studying pronoun resolution. Drawing on advances in discourse representation and global memory modeling, this framework suggests that automatic processes may not always ident...
How Far Are We From (Semi-)Automatic Annotation Of Anaphoric Links In Corpora?
The paper raises for discussion a proposal for the semi-automatic annotation of pronoun-antecedent pairs in corpora. The proposal is based on robust knowledge-poor pronoun resolution followed by post-editing. The paper is structured as follows. The introduction comments on the fact that automatic identification of referential links in corpora has lagged behind in comparison with similar lexical...
Active search for antecedents in cataphoric pronoun resolution
Cataphoric dependencies where a pronoun precedes its antecedent appear to call on different mechanisms in language comprehension from forward dependencies where the antecedent precedes the pronoun. Previous research has shown that the resolution of cataphoric dependencies involves predictive processes such as the active search mechanism, which hypothesizes the automatic search for an antecedent...
Doing Dutch Pronouns Automatically in Optimality Theory
Pronoun resolution algorithms often use elaborate and complicated rules and weighted factors. In this paper I will use the framework of Optimality Theory to implement an automatic pronoun resolution system for Dutch. By ranking constraints, Optimality Theory can be used to model complex behaviour and preferences, whilst keeping the constraints clear and simple. The system is developed by quanta...
La reconnaissance automatique de la fonction des pronoms démonstratifs en langue arabe (Automatic recognition of demonstrative pronouns function in Arabic) [in French]
Anaphora resolution is one of the most difficult tasks in NLP. Classifying pronouns before attempting a task of anaphora resolution is important because to handle the cataphoric pronoun, the system should determine the antece...